IFN619 - Data Analytics for Strategic Decision Makers (2023_sem1)

IFN619 :: Assignment 2 :: Extending Analytics (40%)¶

IMPORTANT: Refer to the instructions in Canvas module About the Assessment - Assignment 2 BEFORE working on this assignment.

  1. Complete and run the code cell below to display your name, student number, and assignment option
  2. Delete the cells for the option that you are not using
  3. Complete a full analysis for your chosen option. Ensure that you include:
    • documentation of your thinking and decision-making
    • clear articulation of ethical considerations and human factors in both your analysis choices and your insights
    • at least one extended analysis approach (unstructured data analytics, enhanced visualisation, machine learning techniques)
  4. Submit your final notebook on Canvas Assignment 2 submission page
In [1]:
# Complete the following cell with your details and run to produce your personalised header for this assignment

from IPython.display import HTML

# personal details
first_name = "Vishnu"
last_name = "Shaji"
student_number = "N11407697"
# assignment option
option_choice = "own choice of analysis" # either "extending assign 1" OR "own choice of analysis"

personal_header = f"<h1>{first_name} {last_name} ({student_number}) :: Option: {option_choice}</h1>"
HTML(personal_header)
Out[1]:

Vishnu Shaji (N11407697) :: Option: own choice of analysis


Option 1 - extending assign 1¶

Extend your assignment 1 analysis by including consideration of ethical and human factors, including 1 extended analysis technique, and using at least 1 additional data source. Possible additional data sources:

  1. Guardian API (See accessing the Guardian API notebook)
  2. Hospital location data (Provided, see data folder)

NOTE: you should not repeat the analysis from assignment 1, but you may need to save dataframes from assignment 1 and reload for use in this assignment. You may also summarise your assignment 1 insights as part of the process of identifying questions for analysis.


Option 2 - own choice of analysis¶

You may choose your own scenario and your own data source. However, you must provide a link to the source of your data, and a good description of it. You should also provide a detailed description of your scenario and why your chosen question/s is/are important.

Scenario: Brisbane City Road Crash Data Analytics Study¶

Road safety is a crucial concern for the Queensland government, as traffic accidents not only cause significant loss of life but also lead to substantial economic and social costs. Recognizing the need for evidence-based strategies to improve road safety, this study aims to analyze and gain insights from the extensive dataset of road crash incidents in the state of Queensland, Australia. By leveraging advanced data analytics techniques, the study intends to identify patterns, trends, and factors contributing to road crashes happening in Brisbane. The findings from this study will aid in formulating effective strategies for improving road safety and reducing the number of accidents on Brisbane City roads.

QUESTIONS¶

1. What are the major hotspots for road crashes in and around Brisbane City?¶

Why is it important? - Through the Brisbane City Road Crash Data Analytics Study, several major hotspots for road crashes in and around Brisbane City can be identified. These hotspots refer to specific locations or areas where a high frequency of road accidents occurs. This study utilizes spatial analysis techniques to visualize the geographical distribution of crashes and identify these hotspots.

2. How severe are the casualties reported in these hotspots? 3. What might be the key factors that influence severity?¶

Why is it important? - The study can also be extended to analyze the severity of casualties reported in these hotspots. Severity refers to the degree of harm or injury suffered by individuals involved in road accidents. Understanding the severity of casualties is crucial for assessing the impact of accidents and identifying areas where the risk to life and well-being is particularly high.

Ethical considerations¶

  1. Injury Prevention and Public Safety: Understanding the major hotspots for road crashes in and around Brisbane City is crucial to keeping people safe and preventing injuries. By figuring out where these hotspots are, authorities can focus their efforts and resources on implementing specific measures to make those areas safer. This could involve improving road infrastructure, enhancing safety measures, and increasing law enforcement presence to reduce the chances of accidents happening.

  2. Resource Allocation and Emergency Response: Knowing the severity of casualties and what factors contribute to the seriousness of injuries helps emergency services respond more effectively. It allows them to allocate their resources in a way that matches the needs of each situation. For instance, if they have a better understanding of the factors that influence the severity of injuries, they can make sure they send the right medical resources and personnel to the areas most affected. It also helps them evaluate their current emergency response systems and make any necessary improvements to better handle accidents in the future.

  3. Policy Development and Traffic Management: Identifying the key factors that influence the severity of accidents provides valuable insights for policymakers and those responsible for managing traffic. This information helps them create well-informed policies and strategies based on solid evidence. By understanding these factors, they can implement measures like adjusting speed limits, improving road signs, implementing traffic calming techniques, and launching educational campaigns to address the specific issues contributing to severe accidents.

  4. Privacy and Confidentiality: When analyzing data for road crash studies, it's essential to respect the privacy and confidentiality of the individuals involved in accidents. This means ensuring that all data is anonymized and aggregated to protect the identities of the victims. By following privacy regulations and best practices, the study can maintain the trust and confidence of the community, which is crucial for obtaining accurate data and achieving meaningful results.

  5. Bias and Fairness: It's important to conduct the analysis in a fair and unbiased manner, without favoring any particular groups or regions. Every effort should be made to avoid discriminatory practices or unfair profiling when identifying hotspots and factors influencing severity. By ensuring fairness, the study can provide a more accurate and representative picture of the road crash situation, leading to more effective interventions.

  6. Responsible Data Usage: The data collected for the study should be handled responsibly and used solely for the purpose of improving road safety. This means securely storing and protecting the data to prevent unauthorized access or misuse. By treating the data with care and adhering to ethical standards, the study can generate meaningful insights while maintaining the integrity and privacy of the individuals involved.

THE DATA¶

The dataset available at the provided link (https://www.data.qld.gov.au/dataset/crash-data-from-queensland-roads/resource/e88943c0-5968-4972-a15f-38e120d72ec0) contains crash data from Queensland roads. It serves as a comprehensive and valuable resource for understanding and analyzing road accidents in the region. The dataset encompasses a wide range of information related to crashes, including location, date and time, weather conditions, road surface type, vehicle types, driver demographics, contributing factors, and severity of casualties. This rich dataset allows researchers, analysts, and policymakers to explore and uncover patterns, trends, and factors associated with road crashes in Queensland.

A subset of the dataset is used for this study. With this subset, the Brisbane City Council and other stakeholders can derive meaningful insights to inform evidence-based strategies aimed at improving road safety, reducing accidents, and minimizing the severity of casualties in the region. The availability of this dataset promotes transparency, facilitates data-driven decision-making, and supports the development of targeted interventions to enhance road safety outcomes in the area of study.

Ethical considerations about the data¶

When working with the data from Queensland roads, it is essential to address several ethical concerns to ensure responsible and informed data analysis. The key considerations taken into account are listed below.

  1. Data Legality and Consent: The dataset is obtained through lawful means and with appropriate consent from the individuals involved in the accidents. It contains no personal information that may breach the confidentiality of individuals. Therefore, the use of the crash data complies with relevant legal frameworks and privacy regulations.

  2. Data Accuracy and Integrity: Maintaining the accuracy and integrity of the crash data is key. As a data analyst, it is important to ensure that the information used for analysis is reliable, valid, and free from errors or biases. Careful attention should be paid to data cleansing, quality control, and verification processes to minimize inaccuracies that could lead to misleading conclusions or decisions.

  3. Data Security and Confidentiality: The data is made openly available to the public by the Queensland Government, with little or no risk of breaching confidentiality.

PREPARING THE DATA¶

Before preparing the data, let's import the libraries required for the study.

In [2]:
# Importing the basic pandas library to access the data.
import pandas as pd
import plotly.express as px
import warnings

# Turn off the SettingWithCopyWarning
warnings.filterwarnings("ignore", category=pd.core.common.SettingWithCopyWarning)

# Importing the Machine Learning Libraries to perform Clustering
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.cluster import KMeans

A subset of the Queensland road crash data is created by keeping only the crashes from 2019 to 2021 whose 'Loc_Local_Government_Area' is Brisbane City. This makes sense because the stakeholder concerned here is the Brisbane City Council.

In [3]:
# Load the data
data = pd.read_csv('data/qld_crash_locations.csv')

# Filter the data to include only the desired years and accidents in Brisbane City
years  = [2019,2020,2021]
brisbane_crash_data = data[data['Crash_Year'].isin(years)]
brisbane_crash_data = brisbane_crash_data[brisbane_crash_data['Loc_Local_Government_Area']=='Brisbane City']
len(brisbane_crash_data)
Out[3]:
10029

To answer the questions, it is important to select relevant columns from the dataset that can provide insights into crash severity and location-based factors. We can refine the dataset so that it includes only the columns necessary for the required analysis.

In [4]:
# Define the columns to keep in the refined dataset
columns_to_keep = [
    'Crash_Severity',
    'Crash_Year',
    'Crash_Month',
    'Crash_Day_Of_Week',
    'Crash_Hour',
    'Crash_Nature',
    'Crash_Type',
    'Crash_Longitude',
    'Crash_Latitude',
    'Crash_Street',
    'Crash_Street_Intersecting',
    'Loc_Suburb',
    'Loc_ABS_Statistical_Area_3',
    'Crash_Roadway_Feature',
    'Crash_Traffic_Control',
    'Crash_Road_Surface_Condition',
    'Crash_Speed_Limit',
    'Crash_Atmospheric_Condition',
    'Crash_Lighting_Condition',
    'Crash_Road_Horiz_Align',
    'Crash_Road_Vert_Align',
    'Crash_DCA_Code',
    'Crash_DCA_Description',
]

# Select the relevant columns; .copy() gives an independent frame so later edits do not act on a slice
crash_by_area = brisbane_crash_data[columns_to_keep].copy()
crash_by_area.shape
Out[4]:
(10029, 23)

Here we have cut a significant number of variables from the dataset. While the dropped columns may be useful for understanding other aspects of the crashes, they are not required for this analysis.

Let's continue by searching for any missing values.

In [5]:
crash_by_area.isna().sum()
Out[5]:
Crash_Severity                     0
Crash_Year                         0
Crash_Month                        0
Crash_Day_Of_Week                  0
Crash_Hour                         0
Crash_Nature                       0
Crash_Type                         0
Crash_Longitude                    0
Crash_Latitude                     0
Crash_Street                       0
Crash_Street_Intersecting       4964
Loc_Suburb                         0
Loc_ABS_Statistical_Area_3         0
Crash_Roadway_Feature              0
Crash_Traffic_Control              0
Crash_Road_Surface_Condition       0
Crash_Speed_Limit                  0
Crash_Atmospheric_Condition        0
Crash_Lighting_Condition           0
Crash_Road_Horiz_Align             0
Crash_Road_Vert_Align              0
Crash_DCA_Code                     0
Crash_DCA_Description              0
dtype: int64

A relatively large portion of the data (4964 of 10029 rows) is missing from Crash_Street_Intersecting. Since the values in this column are geographical locations, imputing them would be challenging without additional information and is out of scope for the study, so we can drop this column.
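To put the missing count in proportion, the share of missing values per column can be computed directly. The sketch below uses a small hypothetical frame (`toy`, with made-up values) rather than the crash data, and the 40% drop threshold is an assumption chosen for illustration. On the real data the same expression would be applied to `crash_by_area`, where 4964 of 10029 rows (about 49.5%) lack an intersecting street.

```python
import pandas as pd

# Hypothetical toy frame standing in for crash_by_area (values are made up)
toy = pd.DataFrame({
    "Crash_Street": ["Ann St", "Adelaide St", "Wickham St", "Albert St"],
    "Crash_Street_Intersecting": ["Creek St", None, None, "Mary St"],
})

# Share of missing values per column, as a percentage
missing_pct = toy.isna().mean().mul(100).round(1)
print(missing_pct)

# Columns whose missing share exceeds a chosen threshold (40% is an assumption)
threshold = 40.0
cols_to_drop = missing_pct[missing_pct > threshold].index.tolist()
print(cols_to_drop)
```

Making the threshold explicit documents why a column was dropped rather than imputed.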

In [6]:
crash_by_area.drop("Crash_Street_Intersecting", axis=1, inplace=True)

ANALYZING¶

Let us see some descriptive statistics of our data, to understand a bit about the shape of the data

In [7]:
crash_by_area[['Crash_Severity','Crash_Nature','Crash_Type']].describe()
Out[7]:
           Crash_Severity  Crash_Nature     Crash_Type
count               10029         10029          10029
unique                  4            12              4
top     Medical treatment      Rear-end  Multi-Vehicle
freq                 4739          3713           7721

In terms of "Crash_Severity," there are four unique categories, suggesting that crashes vary in their severity. The most frequently occurring severity level is "Medical treatment," which appears 4,739 times in the dataset.

Moving on to "Crash_Nature," we observe 12 unique categories that describe the nature or cause of the crashes. The most common nature of crashes is "Rear-end," which accounts for 3,713 occurrences in the dataset.

Regarding "Crash_Type," there are four distinct types of crashes. The most prevalent type is "Multi-Vehicle," which appears 7,721 times.

VISUALIZE¶

Let's visualise the data to gain a comprehensive understanding of the dataset and the nature of the crashes.

In [8]:
# Visualize the dataset by grouping the Crash_Severity column
fig = px.histogram(crash_by_area, x='Crash_Severity',text_auto=True)

# Customize the plot if needed (e.g., labels, title, etc.)
fig.update_layout(
    title='Distribution of Crash Severity in Brisbane City',
    xaxis_title='Severity of Crashes',
    yaxis_title='Number of Crashes',
)

fig.update_layout(
    title_font_size=25,
    title_x=0.5,
)

fig.update_yaxes(range=[0, 10029])


# Show the plot
fig.show()

According to the source of the data, casualties are classified into four levels of severity:

  1. Minor Injury
  2. Medical Treatment
  3. Hospitalisation
  4. Fatal

Here we can see that most crashes in Brisbane City result in medical treatment, followed by hospitalisation. This is a major concern for the Brisbane City Council (BCC), as these outcomes bear directly on public health.

Although these numbers are concerning, they say little about how the crashes are distributed around Brisbane City. To understand this distribution, we can use the latitude and longitude values of the crash data to display the crashes on a map.

The visualisation will be based on the Statistical Areas in and around Brisbane City, their average severity, and their crash counts.

Before visualising the concentration of categorical variables such as severity, it's important to convert them to numbers and scale them to an appropriate range to get meaningful visuals.

Let's start by converting and scaling 'Crash_Severity'.

In [9]:
# Mapping dictionary for severity conversion
severity_mapping = {
    'Fatal': 4,
    'Hospitalisation': 3,
    'Medical treatment': 2,
    'Minor injury': 1,
}

# Mapping to dataframe and saving to a new col
crash_by_area['Crash_Severity_n'] = crash_by_area['Crash_Severity'].map(severity_mapping)

# Create an instance of MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))

# Scale the 'Crash_Severity_n' column to the range 0-1
crash_by_area['Crash_Severity_n'] = scaler.fit_transform(crash_by_area[['Crash_Severity_n']])

# Verify the scaled values
table_1 = pd.DataFrame(crash_by_area[['Crash_Severity', 'Crash_Severity_n']].value_counts()).reset_index()
table_1
Out[9]:
      Crash_Severity  Crash_Severity_n     0
0  Medical treatment          0.333333  4739
1    Hospitalisation          0.666667  3565
2       Minor injury          0.000000  1665
3              Fatal          1.000000    60
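The scaled values in the table follow from the min-max formula, scaled = (x - min) / (max - min), with min = 1 and max = 4. A minimal sketch on a toy series (one row per severity level, not the crash data) reproduces them by hand. Note that the 1 to 4 codes assume the severity levels are equally spaced, which is a modelling choice rather than a property of the data.

```python
import pandas as pd

severity_mapping = {"Minor injury": 1, "Medical treatment": 2, "Hospitalisation": 3, "Fatal": 4}

# Toy series with one row per severity level (illustrative only)
levels = pd.Series(list(severity_mapping), name="Crash_Severity")
codes = levels.map(severity_mapping)

# Min-max scaling by hand: (x - min) / (max - min)
scaled = (codes - codes.min()) / (codes.max() - codes.min())
print(pd.DataFrame({"Crash_Severity": levels, "code": codes, "scaled": scaled.round(6)}))
```

Because the codes are evenly spaced, the scaled values land at 0, 1/3, 2/3 and 1, which is what `MinMaxScaler` returns for the same input range.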

Now, let's group the data by Statistical Area 3. This variable is chosen because it provides a suitable level of abstraction for crash location. This matters because too much or too little abstraction can give a misleading picture.

In [10]:
# Aggregate by Statistical Area 3: mean severity, mean coordinates, and crash count
grouped_data = crash_by_area.groupby('Loc_ABS_Statistical_Area_3').agg({'Crash_Severity_n': 'mean',
                                                             'Crash_Latitude': 'mean',
                                                             'Crash_Longitude': 'mean',
                                                             'Crash_Year': 'count'}).reset_index()

# Rename the count column
grouped_data.rename(columns={'Crash_Year': 'Crash_Counts','Loc_ABS_Statistical_Area_3':'Statistical_Area'}, inplace=True)
grouped_data
Out[10]:
                  Statistical_Area  Crash_Severity_n  Crash_Latitude  Crash_Longitude  Crash_Counts
 0       Bald Hills - Everton Park          0.421751      -27.350625       153.004009           377
 1                  Brisbane Inner          0.378512      -27.469342       153.026445          1210
 2           Brisbane Inner - East          0.400440      -27.473845       153.059584           303
 3          Brisbane Inner - North          0.400669      -27.434630       153.036974           797
 4           Brisbane Inner - West          0.423387      -27.465511       152.995538           496
 5                   Browns Plains          0.606061      -27.639549       152.966813            11
 6                        Capalaba          0.416667      -27.507678       153.154678           140
 7                       Carindale          0.394558      -27.494605       153.100171           294
 8                       Centenary          0.375622      -27.546882       152.940062           134
 9                       Chermside          0.390507      -27.388664       153.028098           618
10             Forest Lake - Oxley          0.420091      -27.589706       152.960436           657
11          Holland Park - Yeronga          0.413952      -27.503698       153.041940           798
12              Ipswich Hinterland          0.444444      -27.356649       152.768145             3
13                   Ipswich Inner          0.435897      -27.540025       152.796293            13
14  Kenmore - Brookfield - Moggill          0.423892      -27.520401       152.927136           173
15                      Mt Gravatt          0.388391      -27.560154       153.095168           781
16                          Nathan          0.387863      -27.549274       153.039161           379
17                          Nundah          0.378546      -27.386459       153.073590           376
18                       Redcliffe          0.444444      -27.145570       153.397687             3
19          Rocklea - Acacia Ridge          0.397348      -27.607349       153.030299           729
20                        Sandgate          0.421627      -27.326524       153.047223           336
21        Sherwood - Indooroopilly          0.392491      -27.506987       152.979278           293
22                       Sunnybank          0.369681      -27.595919       153.063174           376
23              The Gap - Enoggera          0.442379      -27.426193       152.959814           269
24              The Hills District          0.500000      -27.349325       152.771997             2
25                  Wynnum - Manly          0.412148      -27.459631       153.144465           461

We have 26 areas with their aggregated geo-locations. This level of grouping works well because it summarises crashes by area rather than by suburb or street, where too much information crammed into one map might cause us to overlook important details or patterns.
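One caveat when reading the mean severity per area: areas with only a handful of crashes (for example The Hills District with 2, or Redcliffe and Ipswich Hinterland with 3 each) produce averages that a single crash can swing. A minimal sketch of one way to flag such rows is given here, using four rows copied from the grouped table and a hypothetical cut-off `min_crashes` of 30, which is an assumption and not a value used in this study.

```python
import pandas as pd

# Four rows copied from the grouped table of Statistical Areas
grouped_sample = pd.DataFrame({
    "Statistical_Area": ["Brisbane Inner", "Browns Plains", "The Hills District", "Mt Gravatt"],
    "Crash_Severity_n": [0.378512, 0.606061, 0.500000, 0.388391],
    "Crash_Counts": [1210, 11, 2, 781],
})

min_crashes = 30  # hypothetical cut-off; choose to suit the analysis

# Flag areas whose mean severity rests on too few crashes to be reliable
grouped_sample["Low_Sample"] = grouped_sample["Crash_Counts"] < min_crashes
reliable = grouped_sample[~grouped_sample["Low_Sample"]]
print(reliable[["Statistical_Area", "Crash_Severity_n", "Crash_Counts"]])
```

Flagging rather than silently dropping keeps the low-count areas visible while signalling that their averages should be read with caution.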

We can now go about visualizing the data.

In [11]:
# Plot the points on a map using plotly.express
fig = px.scatter_mapbox(grouped_data, lat='Crash_Latitude', lon='Crash_Longitude', hover_name='Statistical_Area',
                        size='Crash_Counts', color='Crash_Severity_n',
                        color_continuous_scale=['red','black'], size_max=25, zoom=9,
                       template="plotly")

# Customize the map layout
fig.update_layout(mapbox_style='open-street-map')
fig.update_layout(title='Number of Road Crashes by Statistical Area', height=600, width=800)
fig.update_layout(
    title_font_size=23,
)

# Display the map
fig.show()

Note: The size of each point depends on the total count of crashes, and the red-black scale is used so that the average severity of an area can be easily identified. Red was chosen, following color theory, to provoke alertness in the viewer.

Few takes from the Visualisations:¶

  1. The map clearly shows that most crashes happen in Brisbane Inner, followed by Holland Park - Yeronga, Brisbane Inner - North, and Mt Gravatt.
  2. There is a general trend that crashes tend to be more severe outside the city.
  3. Although the locations of the points may not be exact, most of the severe cases appear to occur around a motorway or a highway.

Expanding on the ideas- What would be the reason?, How can this information help the concerned stakeholder?¶

Upon analyzing the visualizations, a few key insights emerge. Firstly, it becomes evident that the majority of crashes occur in Brisbane Inner, specifically in and around the bustling Brisbane CBD. This concentration of incidents points to a higher risk in densely populated areas, which is likely influenced by the hectic nature of city life, especially during office hours, probably when the rush is at its peak.

The second observation is the general trend indicating that cases outside the city tend to be more severe. Several factors might contribute to this phenomenon, including higher speed limits on rural roads, less stringent traffic enforcement measures, or longer emergency response times. The combination of these factors outside the city can potentially increase the severity of crashes, highlighting the need for targeted interventions in rural areas to improve road safety.

Furthermore, while acknowledging the uncertainty surrounding the accuracy of the location points, it is interesting to note that severe cases tend to occur in proximity to motorways or highways. This correlation suggests a possible connection between the higher-speed nature of these roadways and the severity of crashes. Exploring this relationship further can provide valuable insights into how road type influences crash outcomes and may guide the implementation of measures such as enhanced signage, adjusted speed limits, or improved safety features along these routes.
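One way such a follow-up could begin is a row-normalised cross-tabulation of severity against speed limit. The sketch below is illustrative only: it runs on a hypothetical toy frame, and the speed-limit labels and severities in it are made up rather than taken from the crash data. On the real data the same call would use the `Crash_Speed_Limit` and `Crash_Severity` columns of `crash_by_area`.

```python
import pandas as pd

# Hypothetical toy rows; the speed-limit labels and severities are made up
toy = pd.DataFrame({
    "Crash_Speed_Limit": ["60 km/hr", "60 km/hr", "100 km/hr",
                          "100 km/hr", "60 km/hr", "100 km/hr"],
    "Crash_Severity": ["Minor injury", "Medical treatment", "Hospitalisation",
                       "Fatal", "Medical treatment", "Hospitalisation"],
})

# Share of each severity level within each speed limit (rows sum to 1)
severity_share = pd.crosstab(
    toy["Crash_Speed_Limit"], toy["Crash_Severity"], normalize="index"
)
print(severity_share.round(2))
```

Normalising by row compares the severity mix across speed limits regardless of how many crashes each limit has, though any pattern found this way is an association and not evidence of cause.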

Takeaway:¶

Further study is warranted to gain a deeper understanding of these patterns and explore potential causes. It would be beneficial to investigate the specific factors contributing to the concentration of crashes in Brisbane Inner, such as traffic congestion, road design, or driver behavior.

Note - Is it Ethical?¶

Before conducting further study, we have to consider the ethical factors. From an ethical standpoint, the conclusions can be considered sound for the following reasons:

  1. Public Safety: Understanding the patterns and factors influencing road crashes can contribute to the development of targeted interventions and policies aimed at improving road safety. By identifying areas with a higher concentration of crashes and higher severity, resources can be allocated more effectively to implement measures that reduce the risk of accidents and protect public safety.

  2. Evidence-Based Decision Making: The conclusion is based on visualizations and data analysis, which can provide a factual and evidence-based foundation for decision making. Policymakers, city planners, and road safety advocates can utilize these findings to inform their actions and allocate resources more efficiently to address specific areas or issues.

However, it is also important to acknowledge the limitations and potential ethical concerns associated with the conclusion.

Focusing only on the locations with more severe cases may result in a disproportionate allocation of resources and interventions, potentially neglecting other areas that also require attention. It is crucial to take a holistic approach to road safety that accounts for factors such as socioeconomic disparities, driver behavior, and environmental conditions, rather than focusing solely on geographical areas with high severity rates.

Taking a deeper look into the city¶

Let's study the crashes in 'Brisbane Inner' to gain an understanding of what is happening in the major accident hotspot of Brisbane City.

In [12]:
crashes_cbd = crash_by_area[crash_by_area['Loc_ABS_Statistical_Area_3']=='Brisbane Inner']
In [13]:
crashes_cbd['Crash_Severity'].describe()
Out[13]:
count                  1210
unique                    4
top       Medical treatment
freq                    589
Name: Crash_Severity, dtype: object

Based on the descriptive analysis of the "Crash_Severity" variable in Brisbane Inner, there are a total of 1,210 records of crashes. The most frequent severity level is "Medical treatment" with a frequency of 589 occurrences.

Let us visualise these crashes in the map by aggregating the location by street level

In [14]:
# Aggregate Brisbane Inner crashes by street: mean severity, mean coordinates, and crash count
grouped_data_street = crashes_cbd.groupby('Crash_Street').agg({'Crash_Severity_n': 'mean',
                                                             'Crash_Latitude': 'mean',
                                                             'Crash_Longitude': 'mean',
                                                             'Crash_Year': 'count'}).reset_index()

# Rename the count column
grouped_data_street.rename(columns={'Crash_Year': 'Crash_Counts'}, inplace=True)

grouped_data_street
Out[14]:
      Crash_Street  Crash_Severity_n  Crash_Latitude  Crash_Longitude  Crash_Counts
  0    Adelaide St          0.398148      -27.465827       153.028257            36
  1     Adeline La          0.000000      -27.466759       153.046530             1
  2       Agnes St          0.666667      -27.457500       153.031129             1
  3      Albert St          0.370370      -27.470371       153.025958             9
  4       Alden St          0.444444      -27.456204       153.034901             3
...            ...               ...             ...              ...           ...
209  Wellington Rd          0.333333      -27.482315       153.040692             2
210       Wharf St          0.166667      -27.464235       153.028877             2
211      Whynot St          0.666667      -27.484698       153.008798             1
212     Wickham St          0.348485      -27.454238       153.036995            22
213    Wickham Tce          0.200000      -27.464159       153.024535             5

214 rows × 5 columns

In [15]:
# Plot the points on a map using plotly.express
fig = px.scatter_mapbox(grouped_data_street, lat='Crash_Latitude', lon='Crash_Longitude', hover_name='Crash_Street',
                        size='Crash_Counts', color='Crash_Severity_n',
                        color_continuous_scale=['yellow','red','black'], size_max=30, zoom=12,
                       template="plotly")

# Customize the map layout
fig.update_layout(mapbox_style='stamen-terrain')
fig.update_layout(title='Number of Road Crashes by Street', height=600, width=800)


# Display the map
fig.show()

Note: Certain human factors were kept in mind when visualising. To reduce bias, yellow was chosen for streets with less severe cases, because a color such as white might lead the reader to believe the street has no crashes. At the other extreme, black emphasises streets with highly severe cases, as it is important to identify such areas easily.

Takes from the Visualisation¶

  1. The majority of the accidents happen in and around Pacific Highway, followed by Ann St and Brunswick St.
  2. Almost all cases with more severe outcomes happen in relatively small streets with intersections nearby.

Expanding on the ideas- What would be the reason?, How can this information help the concerned stakeholder?¶

To expand on the possible reasons for these patterns, the high frequency of accidents along the Pacific Highway may be attributed to its status as a major transportation route, resulting in heavy traffic congestion and increased chances of collisions. The BCC could consider implementing traffic management strategies such as widening the road, creating additional lanes, or introducing intelligent transportation systems to improve traffic flow and reduce congestion.

Factors such as inadequate signage, poor road conditions, or inadequate street lighting could contribute to the accident-prone nature of Ann St and Brunswick St. The added rush in the area could also play a part, as both streets are hotspots that easily get busy during peak hours and holiday seasons. Implementing traffic calming measures such as speed bumps or traffic islands could help reduce vehicle speeds and enhance safety, particularly at those times.

Additionally, the concentration of severe outcomes in smaller streets with intersections suggests that the layout and design of these areas might contribute to more dangerous conditions. Potential factors could include inadequate visibility, insufficient traffic control measures, or inadequate space for vehicles to maneuver safely. The city council could prioritize enhancing the design and layout of these areas. This could involve conducting thorough traffic assessments to identify potential visibility issues and then implementing appropriate measures such as trimming vegetation obstructing sightlines or installing additional street lighting.

Takeaway:¶

While these points offer suggestions, they are inconclusive as to which factors cause the crashes. We can analyse the issue further by looking at the major hotspots around the city. Techniques such as clustering could help us understand the nature of the crashes, and regression can be used to identify the contributing factors.
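The clustering idea can be sketched as follows. This is a minimal illustration on synthetic coordinates, not an analysis performed in this study: the two centres, the spread of the points, the choice of `n_clusters=2`, and the random seeds are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic crash coordinates around two made-up centres (not real crash data)
rng = np.random.default_rng(42)
centres = np.array([[-27.47, 153.03], [-27.56, 153.09]])
points = np.vstack([c + rng.normal(scale=0.005, size=(50, 2)) for c in centres])
coords = pd.DataFrame(points, columns=["Crash_Latitude", "Crash_Longitude"])

# Standardise so latitude and longitude contribute on a comparable scale
scaled = StandardScaler().fit_transform(coords)

# n_clusters=2 is an assumption that matches the synthetic data
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
coords["Cluster"] = kmeans.fit_predict(scaled)

# Cluster centroids in the original coordinate units, and cluster sizes
print(coords.groupby("Cluster")[["Crash_Latitude", "Crash_Longitude"]].mean())
print(coords["Cluster"].value_counts())
```

On real crash data the number of clusters would need to be chosen with care (for example by comparing inertia across several values of `n_clusters`), and further features such as severity or speed limit could be added before scaling.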

Note: Additional attempts were made to identify seasonal patterns and to perform clustering based on road condition. However, they are excluded because they fell outside the scope of this project.
